Overview

Brought to you by YData

Dataset statistics

Metric | Dataset A | Dataset B
Number of variables | 12 | 12
Number of observations | 446 | 446
Missing cells | 432 | 430
Missing cells (%) | 8.1% | 8.0%
Duplicate rows | 0 | 0
Duplicate rows (%) | 0.0% | 0.0%
Total size in memory | 45.3 KiB | 45.3 KiB
Average record size in memory | 104.0 B | 104.0 B
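The missing-cell and record-size figures follow from the other counts in the report. A minimal sketch, assuming the per-variable missing counts listed under Variables (Age, Cabin, Embarked) are the only missing cells and that 1 KiB = 1024 bytes:

```python
# Per-variable missing counts copied from the Variables section of this report.
MISSING = {
    "Dataset A": {"Age": 92, "Cabin": 339, "Embarked": 1},
    "Dataset B": {"Age": 85, "Cabin": 345, "Embarked": 0},
}
N_VARIABLES = 12
N_OBSERVATIONS = 446


def missing_cells(per_variable):
    """Return (total missing cells, percentage of all cells, 1 decimal)."""
    total = sum(per_variable.values())
    pct = 100 * total / (N_VARIABLES * N_OBSERVATIONS)
    return total, round(pct, 1)


def average_record_size(total_kib, n_rows):
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return round(total_kib * 1024 / n_rows, 1)


for name, counts in MISSING.items():
    total, pct = missing_cells(counts)
    print(f"{name}: {total} missing cells ({pct}%)")
print(average_record_size(45.3, N_OBSERVATIONS), "B per record")
```

Because the reported 45.3 KiB is itself rounded, the record-size check only agrees to the displayed precision.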

Variable types

Type | Dataset A | Dataset B
Numeric | 5 | 5
Categorical | 4 | 4
Text | 3 | 3

Alerts

Dataset A | Dataset B | Alert type
Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High correlation
Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High correlation
Age has 92 (20.6%) missing values | Age has 85 (19.1%) missing values | Missing
Cabin has 339 (76.0%) missing values | Cabin has 345 (77.4%) missing values | Missing
PassengerId has unique values | PassengerId has unique values | Unique
Name has unique values | Name has unique values | Unique
SibSp has 295 (66.1%) zeros | SibSp has 305 (68.4%) zeros | Zeros
Parch has 348 (78.0%) zeros | Parch has 338 (75.8%) zeros | Zeros
Fare has 7 (1.6%) zeros | Alert not present in this dataset | Zeros
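The High correlation alerts for Sex and Survived come from pairwise association measures between variables. For two categorical variables a common measure is Cramér's V; the sketch below computes the plain (uncorrected) statistic from a contingency table. The cell counts are hypothetical, chosen only so that the row and column totals match Dataset A's Sex (285 male, 161 female) and Survived (271 zeros, 175 ones) counts, because the report does not list the cross-tabulation. ydata-profiling's own implementation and its alert threshold may differ, for example by applying a bias correction.

```python
import math


def cramers_v(table):
    """Plain Cramér's V for an r x c contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1  # min(rows, cols) - 1
    return math.sqrt(chi2 / (n * k))


# HYPOTHETICAL Sex x Survived counts (rows: male, female; columns: 0, 1).
example = [[230, 55], [41, 120]]
print(round(cramers_v(example), 3))
```

V is 0 when the two variables are independent and 1 when one fully determines the other, so a value near 1 would justify a high-correlation alert.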

Reproduction

Metric | Dataset A | Dataset B
Analysis started | 2025-03-21 10:46:16.022118 | 2025-03-21 10:46:18.270670
Analysis finished | 2025-03-21 10:46:18.267540 | 2025-03-21 10:46:20.503093
Duration | 2.25 seconds | 2.23 seconds
Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0
Download configuration | config.json | config.json
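The Duration row is the difference between the two timestamps above it. A minimal check with the standard library:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started, finished):
    """Elapsed wall-clock seconds between two report timestamps."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


a = duration_seconds("2025-03-21 10:46:16.022118", "2025-03-21 10:46:18.267540")
b = duration_seconds("2025-03-21 10:46:18.270670", "2025-03-21 10:46:20.503093")
print(f"Dataset A: {a:.2f} s, Dataset B: {b:.2f} s")
```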

Variables

PassengerId
Real number (ℝ)

Metric | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 441.31614 | 458.9574
Minimum | 1 | 3
Maximum | 890 | 891
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Metric | Dataset A | Dataset B
Minimum | 1 | 3
5th percentile | 51.25 | 55.25
Q1 | 225.5 | 248
Median | 441.5 | 468
Q3 | 646.75 | 664.75
95th percentile | 848.75 | 847.5
Maximum | 890 | 891
Range | 889 | 888
Interquartile range (IQR) | 421.25 | 416.75
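Quantiles such as 225.5 or 646.75 fall between observed IDs, which is what linear interpolation between sorted values produces (the default method of pandas' `Series.quantile`). A minimal sketch of that rule; that ydata-profiling uses exactly this method is an assumption:

```python
def quantile_linear(values, q):
    """Linear-interpolation quantile, as in pandas/NumPy's default 'linear' method."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q        # fractional 0-based rank
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


data = [1, 2, 3, 4]
q1, q3 = quantile_linear(data, 0.25), quantile_linear(data, 0.75)
print(q1, q3, q3 - q1)  # Q1, Q3 and the interquartile range: 1.75 3.25 1.5
```

Applied to the table itself, IQR = Q3 - Q1 gives 646.75 - 225.5 = 421.25 for Dataset A and 664.75 - 248 = 416.75 for Dataset B.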

Descriptive statistics

Metric | Dataset A | Dataset B
Standard deviation | 252.61733 | 254.09038
Coefficient of variation (CV) | 0.57241806 | 0.55362519
Kurtosis | -1.1311608 | -1.1602973
Mean | 441.31614 | 458.9574
Median Absolute Deviation (MAD) | 213 | 209.5
Skewness | 0.033611435 | -0.075809285
Sum | 196827 | 204695
Variance | 63815.516 | 64561.92
Monotonicity | Not monotonic | Not monotonic
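Kurtosis here is excess kurtosis (0 for a normal distribution). A roughly uniform column such as a row ID is expected to sit near -1.2, which fits the -1.13 and -1.16 reported for PassengerId, and its skewness near 0. The sketch below uses the bias-adjusted sample estimators that pandas' `Series.skew()` and `Series.kurt()` compute; that the report uses exactly these estimators is an assumption.

```python
import math


def sample_skew_kurt(values):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5            # population skewness
    g2 = m4 / m2 ** 2 - 3.0        # population excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


# A perfectly uniform ID column with as many rows as each dataset.
skew, kurt = sample_skew_kurt(list(range(1, 447)))
print(round(skew, 6), round(kurt, 4))
```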
Histogram with fixed size bins (bins=50)
Common values

Dataset A
Value | Count | Frequency (%)
371 | 1 | 0.2%
629 | 1 | 0.2%
587 | 1 | 0.2%
708 | 1 | 0.2%
181 | 1 | 0.2%
73 | 1 | 0.2%
352 | 1 | 0.2%
520 | 1 | 0.2%
164 | 1 | 0.2%
766 | 1 | 0.2%
Other values (436) | 436 | 97.8%

Dataset B
Value | Count | Frequency (%)
461 | 1 | 0.2%
420 | 1 | 0.2%
687 | 1 | 0.2%
885 | 1 | 0.2%
320 | 1 | 0.2%
354 | 1 | 0.2%
238 | 1 | 0.2%
427 | 1 | 0.2%
163 | 1 | 0.2%
402 | 1 | 0.2%
Other values (436) | 436 | 97.8%
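When every value occurs once, the ten rows shown are effectively arbitrary and the remaining 436 values are folded into one "Other values" row. A minimal sketch of how such a table can be built with `collections.Counter`; ties here keep first-seen order, which may not match the order the report uses:

```python
from collections import Counter


def value_count_table(values, top_n=10):
    """Top-N (value, count, percent) rows plus an 'Other values (k)' row.

    Percentages use the total number of rows as the denominator.
    """
    n = len(values)
    counts = Counter(values).most_common()
    rows = [(v, c, round(100 * c / n, 1)) for v, c in counts[:top_n]]
    rest = counts[top_n:]
    if rest:
        other = sum(c for _, c in rest)
        rows.append((f"Other values ({len(rest)})", other, round(100 * other / n, 1)))
    return rows


# Stand-in for a unique ID column: 446 distinct values, each seen once.
ids = list(range(1, 447))
for row in value_count_table(ids)[-2:]:
    print(row)
```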
Minimum 10 values

Dataset A
Value | Count | Frequency (%)
1 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%
6 | 1 | 0.2%
9 | 1 | 0.2%
12 | 1 | 0.2%
13 | 1 | 0.2%
17 | 1 | 0.2%
18 | 1 | 0.2%
19 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
3 | 1 | 0.2%
5 | 1 | 0.2%
9 | 1 | 0.2%
10 | 1 | 0.2%
11 | 1 | 0.2%
12 | 1 | 0.2%
14 | 1 | 0.2%
16 | 1 | 0.2%
19 | 1 | 0.2%
21 | 1 | 0.2%

Survived
Categorical

Metric | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Dataset A: 0 (271), 1 (175)
Dataset B: 0 (259), 1 (187)

Length

Metric | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 2 | 2
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Metric | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 0 | 0
2nd row | 0 | 0
3rd row | 1 | 0
4th row | 0 | 1
5th row | 0 | 0

Common Values

Dataset A
Value | Count | Frequency (%)
0 | 271 | 60.8%
1 | 175 | 39.2%

Dataset B
Value | Count | Frequency (%)
0 | 259 | 58.1%
1 | 187 | 41.9%

Length

Histogram of lengths of the category

Common Values (Plot): charts for Dataset A and Dataset B (not reproduced).

Most occurring characters

Dataset A
Value | Count | Frequency (%)
0 | 271 | 60.8%
1 | 175 | 39.2%

Dataset B
Value | Count | Frequency (%)
0 | 259 | 58.1%
1 | 187 | 41.9%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 446 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Pclass
Categorical

Metric | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Dataset A: 3 (240), 1 (110), 2 (96)
Dataset B: 3 (249), 1 (117), 2 (80)

Length

Metric | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 446 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Metric | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | 3 | 3
2nd row | 2 | 3
3rd row | 1 | 3
4th row | 3 | 1
5th row | 2 | 3

Common Values

Dataset A
Value | Count | Frequency (%)
3 | 240 | 53.8%
1 | 110 | 24.7%
2 | 96 | 21.5%

Dataset B
Value | Count | Frequency (%)
3 | 249 | 55.8%
1 | 117 | 26.2%
2 | 80 | 17.9%

Length

Histogram of lengths of the category

Common Values (Plot): charts for Dataset A and Dataset B (not reproduced).

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 240 | 53.8%
1 | 110 | 24.7%
2 | 96 | 21.5%

Dataset B
Value | Count | Frequency (%)
3 | 249 | 55.8%
1 | 117 | 26.2%
2 | 80 | 17.9%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 446 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 446 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Name
Text

Metric | Dataset A | Dataset B
Distinct | 446 | 446
Distinct (%) | 100.0% | 100.0%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Metric | Dataset A | Dataset B
Max length | 82 | 82
Median length | 50 | 49
Mean length | 27.076233 | 27.067265
Min length | 12 | 12

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 12076 | 12072
Distinct characters | 60 | 60
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Metric | Dataset A | Dataset B
Unique | 446 | 446
Unique (%) | 100.0% | 100.0%

Sample

Row | Dataset A | Dataset B
1st row | Bostandyeff, Mr. Guentcho | Van Impe, Miss. Catharina
2nd row | Jarvis, Mr. John Denzil | Panula, Mr. Jaako Arnold
3rd row | Calderhead, Mr. Edward Pennington | Sutehall, Mr. Henry Jr
4th row | Sage, Miss. Constance Gladys | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)
5th row | Hood, Mr. Ambrose Jr | Arnold-Franchi, Mr. Josef

Most common words

Dataset A
Value | Count | Frequency (%)
mr | 255 | 13.9%
miss | 89 | 4.9%
mrs | 70 | 3.8%
william | 31 | 1.7%
john | 27 | 1.5%
master | 22 | 1.2%
henry | 20 | 1.1%
thomas | 15 | 0.8%
james | 11 | 0.6%
mary | 11 | 0.6%
Other values (895) | 1282 | 69.9%

Dataset B
Value | Count | Frequency (%)
mr | 253 | 13.9%
miss | 90 | 4.9%
mrs | 78 | 4.3%
william | 25 | 1.4%
john | 23 | 1.3%
master | 15 | 0.8%
charles | 14 | 0.8%
henry | 14 | 0.8%
thomas | 13 | 0.7%
george | 12 | 0.7%
Other values (906) | 1288 | 70.6%
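The word table lowercases the text and drops surrounding punctuation, so "Mr." is counted as "mr", while punctuation inside a token survives (compare "c.a" and "a/5" under Ticket). A minimal approximation of that tokenizer, assuming whitespace splitting followed by stripping punctuation from both ends of each token; the library's exact rules (for example any stop-word handling) may differ:

```python
import string
from collections import Counter


def word_counts(texts):
    """Lowercase, split on whitespace, strip surrounding punctuation, count."""
    counter = Counter()
    for text in texts:
        for token in text.lower().split():
            token = token.strip(string.punctuation)
            if token:
                counter[token] += 1
    return counter


# The five sample rows shown for Dataset A.
names_a = [
    "Bostandyeff, Mr. Guentcho",
    "Jarvis, Mr. John Denzil",
    "Calderhead, Mr. Edward Pennington",
    "Sage, Miss. Constance Gladys",
    "Hood, Mr. Ambrose Jr",
]
print(word_counts(names_a).most_common(3))
```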

Most occurring characters

Dataset A
Value | Count | Frequency (%)
(space) | 1388 | 11.5%
r | 969 | 8.0%
e | 865 | 7.2%
a | 832 | 6.9%
s | 656 | 5.4%
i | 648 | 5.4%
n | 633 | 5.2%
M | 575 | 4.8%
l | 528 | 4.4%
o | 524 | 4.3%
Other values (50) | 4458 | 36.9%

Dataset B
Value | Count | Frequency (%)
(space) | 1381 | 11.4%
r | 972 | 8.1%
e | 850 | 7.0%
a | 847 | 7.0%
s | 663 | 5.5%
i | 661 | 5.5%
n | 656 | 5.4%
M | 559 | 4.6%
l | 559 | 4.6%
o | 504 | 4.2%
Other values (50) | 4420 | 36.6%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 12076 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 12072 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Sex
Categorical

Metric | Dataset A | Dataset B
Distinct | 2 | 2
Distinct (%) | 0.4% | 0.4%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Dataset A: male (285), female (161)
Dataset B: male (278), female (168)

Length

Metric | Dataset A | Dataset B
Max length | 6 | 6
Median length | 4 | 4
Mean length | 4.7219731 | 4.7533632
Min length | 4 | 4

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 2106 | 2120
Distinct characters | 5 | 5
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Metric | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | male | female
2nd row | male | male
3rd row | male | male
4th row | female | female
5th row | male | male

Common Values

Dataset A
Value | Count | Frequency (%)
male | 285 | 63.9%
female | 161 | 36.1%

Dataset B
Value | Count | Frequency (%)
male | 278 | 62.3%
female | 168 | 37.7%

Length

Histogram of lengths of the category

Common Values (Plot): charts for Dataset A and Dataset B (not reproduced).

Most occurring characters

Dataset A
Value | Count | Frequency (%)
e | 607 | 28.8%
m | 446 | 21.2%
a | 446 | 21.2%
l | 446 | 21.2%
f | 161 | 7.6%

Dataset B
Value | Count | Frequency (%)
e | 614 | 29.0%
m | 446 | 21.0%
a | 446 | 21.0%
l | 446 | 21.0%
f | 168 | 7.9%
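Because every Sex value is either "male" or "female", the character statistics follow directly from the two value counts: each row contributes one "m", "a" and "l", each "female" adds a second "e" and an "f", and the character total divided by the row count gives the mean length. A sketch using Dataset A's counts:

```python
from collections import Counter


def character_stats(value_counts):
    """Character frequencies, total characters and mean length from value counts."""
    chars = Counter()
    for value, count in value_counts.items():
        for ch in value:
            chars[ch] += count
    total = sum(chars.values())
    n_rows = sum(value_counts.values())
    return chars, total, total / n_rows


chars_a, total_a, mean_len_a = character_stats({"male": 285, "female": 161})
print(chars_a.most_common(), total_a, round(mean_len_a, 7))
```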

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 2106 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 2120 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Age
Real number (ℝ)

Metric | Dataset A | Dataset B
Distinct | 72 | 68
Distinct (%) | 20.3% | 18.8%
Missing | 92 | 85
Missing (%) | 20.6% | 19.1%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 29.978107 | 29.676122
Minimum | 0.42 | 0.42
Maximum | 71 | 71
Zeros | 0 | 0
Zeros (%) | 0.0% | 0.0%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB
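Not every Age percentage uses the same denominator: Missing (%) is relative to all 446 rows, while Distinct (%) and the Mean are relative to the rows that have a value. The arithmetic below reproduces those three figures from the counts in the table and the Sum reported under Descriptive statistics:

```python
def summarize(n_rows, n_missing, n_distinct, total):
    """Missing %, Distinct % and mean, each with the denominator the report uses."""
    n_present = n_rows - n_missing
    return {
        "missing_pct": round(100 * n_missing / n_rows, 1),       # over all rows
        "distinct_pct": round(100 * n_distinct / n_present, 1),  # over non-missing rows
        "mean": total / n_present,                               # Sum / non-missing rows
    }


age_a = summarize(446, 92, 72, 10612.25)
age_b = summarize(446, 85, 68, 10713.08)
print(age_a)
print(age_b)
```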

Quantile statistics

Metric | Dataset A | Dataset B
Minimum | 0.42 | 0.42
5th percentile | 4 | 4
Q1 | 21 | 21
Median | 30 | 28
Q3 | 38 | 38
95th percentile | 54.525 | 57
Maximum | 71 | 71
Range | 70.58 | 70.58
Interquartile range (IQR) | 17 | 17

Descriptive statistics

Metric | Dataset A | Dataset B
Standard deviation | 14.44238 | 14.50826
Coefficient of variation (CV) | 0.48176425 | 0.48888666
Kurtosis | -0.038763589 | 0.077829294
Mean | 29.978107 | 29.676122
Median Absolute Deviation (MAD) | 9 | 8
Skewness | 0.20709079 | 0.31967258
Sum | 10612.25 | 10713.08
Variance | 208.58235 | 210.48961
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Dataset A
Value | Count | Frequency (%)
19 | 16 | 3.6%
22 | 15 | 3.4%
30 | 14 | 3.1%
36 | 13 | 2.9%
28 | 13 | 2.9%
17 | 11 | 2.5%
27 | 11 | 2.5%
33 | 11 | 2.5%
31 | 11 | 2.5%
24 | 11 | 2.5%
Other values (62) | 228 | 51.1%
(Missing) | 92 | 20.6%

Dataset B
Value | Count | Frequency (%)
21 | 17 | 3.8%
24 | 17 | 3.8%
30 | 14 | 3.1%
35 | 14 | 3.1%
25 | 13 | 2.9%
26 | 12 | 2.7%
36 | 11 | 2.5%
19 | 11 | 2.5%
18 | 11 | 2.5%
22 | 11 | 2.5%
Other values (58) | 230 | 51.6%
(Missing) | 85 | 19.1%
Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 1 | 0.2%
1 | 3 | 0.7%
2 | 6 | 1.3%
3 | 4 | 0.9%
4 | 5 | 1.1%
5 | 2 | 0.4%
6 | 2 | 0.4%
7 | 3 | 0.7%

Dataset B
Value | Count | Frequency (%)
0.42 | 1 | 0.2%
0.75 | 2 | 0.4%
0.83 | 2 | 0.4%
1 | 5 | 1.1%
2 | 6 | 1.3%
3 | 2 | 0.4%
4 | 5 | 1.1%
5 | 1 | 0.2%
7 | 2 | 0.4%
8 | 3 | 0.7%

SibSp
Real number (ℝ)

Metric | Dataset A | Dataset B
Distinct | 7 | 7
Distinct (%) | 1.6% | 1.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.52466368 | 0.48206278
Minimum | 0 | 0
Maximum | 8 | 8
Zeros | 295 | 305
Zeros (%) | 66.1% | 68.4%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Metric | Dataset A | Dataset B
Minimum | 0 | 0
5th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 1 | 1
95th percentile | 2 | 2
Maximum | 8 | 8
Range | 8 | 8
Interquartile range (IQR) | 1 | 1

Descriptive statistics

Metric | Dataset A | Dataset B
Standard deviation | 1.0224763 | 0.98910526
Coefficient of variation (CV) | 1.9488223 | 2.0518184
Kurtosis | 15.755013 | 18.600935
Mean | 0.52466368 | 0.48206278
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 3.3730613 | 3.6610549
Sum | 234 | 215
Variance | 1.0454578 | 0.97832922
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=7)
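The bin count differs between variables (50 for PassengerId, Age and Fare; 7 for SibSp; 6 for Parch), which is consistent with using the smaller of 50 and the number of distinct values. That rule is an inference from this report, not documented library behaviour. A sketch of equal-width binning under that assumption, applied to Dataset A's SibSp counts; `fixed_width_histogram` is a hypothetical helper, not a ydata-profiling function:

```python
def fixed_width_histogram(values, max_bins=50):
    """Equal-width histogram; bin count = min(max_bins, distinct values) (ASSUMED rule)."""
    n_bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        idx = int((x - lo) / width) if width else 0
        counts[min(idx, n_bins - 1)] += 1  # the maximum falls in the last, closed bin
    return n_bins, counts


# Dataset A SibSp, expanded from the Common values table.
sibsp_a = [0] * 295 + [1] * 115 + [2] * 14 + [3] * 7 + [4] * 11 + [5] * 2 + [8] * 2
n_bins, counts = fixed_width_histogram(sibsp_a)
print(n_bins, counts)
```

With seven bins over the range 0 to 8, each bin is 8/7 wide, so the values 0 and 1 share the first bin.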
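Common values

Dataset A
Value | Count | Frequency (%)
0 | 295 | 66.1%
1 | 115 | 25.8%
2 | 14 | 3.1%
4 | 11 | 2.5%
3 | 7 | 1.6%
8 | 2 | 0.4%
5 | 2 | 0.4%

Dataset B
Value | Count | Frequency (%)
0 | 305 | 68.4%
1 | 109 | 24.4%
2 | 13 | 2.9%
4 | 7 | 1.6%
3 | 7 | 1.6%
5 | 3 | 0.7%
8 | 2 | 0.4%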
Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0 | 295 | 66.1%
1 | 115 | 25.8%
2 | 14 | 3.1%
3 | 7 | 1.6%
4 | 11 | 2.5%
5 | 2 | 0.4%
8 | 2 | 0.4%

Dataset B
Value | Count | Frequency (%)
0 | 305 | 68.4%
1 | 109 | 24.4%
2 | 13 | 2.9%
3 | 7 | 1.6%
4 | 7 | 1.6%
5 | 3 | 0.7%
8 | 2 | 0.4%
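Since SibSp has only seven distinct values, its Sum, Mean, Variance and Standard deviation can be recovered from the frequency tables alone. The sketch below does this and shows that the reported variance uses the sample formula (dividing by n - 1):

```python
import math


def moments_from_counts(counts):
    """Sum, mean, sample variance (ddof=1) and std from a {value: count} table."""
    n = sum(counts.values())
    total = sum(v * c for v, c in counts.items())
    mean = total / n
    sum_sq = sum(c * (v - mean) ** 2 for v, c in counts.items())
    variance = sum_sq / (n - 1)
    return total, mean, variance, math.sqrt(variance)


sibsp_a = {0: 295, 1: 115, 2: 14, 3: 7, 4: 11, 5: 2, 8: 2}
sibsp_b = {0: 305, 1: 109, 2: 13, 3: 7, 4: 7, 5: 3, 8: 2}
print(moments_from_counts(sibsp_a))
print(moments_from_counts(sibsp_b))
```

Dividing by n instead of n - 1 would give about 1.0431 for Dataset A rather than the reported 1.0454578.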

Parch
Real number (ℝ)

Metric | Dataset A | Dataset B
Distinct | 6 | 6
Distinct (%) | 1.3% | 1.3%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 0.31838565 | 0.39910314
Minimum | 0 | 0
Maximum | 5 | 5
Zeros | 348 | 338
Zeros (%) | 78.0% | 75.8%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Metric | Dataset A | Dataset B
Minimum | 0 | 0
5th percentile | 0 | 0
Q1 | 0 | 0
Median | 0 | 0
Q3 | 0 | 0
95th percentile | 2 | 2
Maximum | 5 | 5
Range | 5 | 5
Interquartile range (IQR) | 0 | 0

Descriptive statistics

Metric | Dataset A | Dataset B
Standard deviation | 0.67838082 | 0.8519108
Coefficient of variation (CV) | 2.130689 | 2.134563
Kurtosis | 7.8658212 | 9.7245775
Mean | 0.31838565 | 0.39910314
Median Absolute Deviation (MAD) | 0 | 0
Skewness | 2.5017672 | 2.814778
Sum | 142 | 178
Variance | 0.46020053 | 0.725752
Monotonicity | Not monotonic | Not monotonic
Histogram with fixed size bins (bins=6)
Common values

Dataset A
Value | Count | Frequency (%)
0 | 348 | 78.0%
1 | 60 | 13.5%
2 | 35 | 7.8%
5 | 1 | 0.2%
4 | 1 | 0.2%
3 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 338 | 75.8%
1 | 59 | 13.2%
2 | 40 | 9.0%
5 | 5 | 1.1%
4 | 2 | 0.4%
3 | 2 | 0.4%
Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0 | 348 | 78.0%
1 | 60 | 13.5%
2 | 35 | 7.8%
3 | 1 | 0.2%
4 | 1 | 0.2%
5 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 338 | 75.8%
1 | 59 | 13.2%
2 | 40 | 9.0%
3 | 2 | 0.4%
4 | 2 | 0.4%
5 | 5 | 1.1%
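With 78.0% (Dataset A) and 75.8% (Dataset B) of Parch values equal to 0, the median, the IQR and the median absolute deviation all collapse to 0. A minimal sketch of MAD as median(|x - median(x)|), computed from the frequency tables; an unscaled MAD (no 1.4826 consistency factor) is assumed, which fits the integer MAD values reported for Age:

```python
import statistics


def mad_from_counts(counts):
    """Unscaled median absolute deviation from a {value: count} table."""
    values = [v for v, c in counts.items() for _ in range(c)]
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)


parch_a = {0: 348, 1: 60, 2: 35, 3: 1, 4: 1, 5: 1}
parch_b = {0: 338, 1: 59, 2: 40, 3: 2, 4: 2, 5: 5}
print(mad_from_counts(parch_a), mad_from_counts(parch_b))
```

For contrast, a small sample with one outlier, {1, 2, 3, 4, 100}, has median 3 and MAD 1: the outlier barely moves it.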

Ticket
Text

Metric | Dataset A | Dataset B
Distinct | 383 | 376
Distinct (%) | 85.9% | 84.3%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Length

Metric | Dataset A | Dataset B
Max length | 18 | 18
Median length | 17 | 17
Mean length | 6.67713 | 6.7556054
Min length | 3 | 3

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 2978 | 3013
Distinct characters | 35 | 32
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Metric | Dataset A | Dataset B
Unique | 333 | 319
Unique (%) | 74.7% | 71.5%
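Distinct counts every different value, while Unique counts only values that occur exactly once: in Dataset A, 383 different ticket strings appear, but only 333 of them appear on a single row. A toy illustration built from ticket strings in the Sample table (the repetitions are invented):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (n_distinct, n_unique), where unique means 'occurs exactly once'."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique


tickets = ["PC 17476", "CA. 2343", "CA. 2343", "349224", "PC 17476", "16966"]
print(distinct_and_unique(tickets))
```

Both percentages in the report divide by the 446 rows: 383 / 446 gives 85.9% and 333 / 446 gives 74.7%.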

Sample

Row | Dataset A | Dataset B
1st row | 349224 | 345773
2nd row | 237565 | 3101295
3rd row | PC 17476 | SOTON/OQ 392076
4th row | CA. 2343 | 16966
5th row | S.O.C. 14879 | 349237

Most common words

Dataset A
Value | Count | Frequency (%)
pc | 28 | 5.0%
c.a | 15 | 2.7%
a/5 | 9 | 1.6%
sc/paris | 6 | 1.1%
w./c | 5 | 0.9%
soton/o.q | 5 | 0.9%
c | 5 | 0.9%
2 | 4 | 0.7%
ca | 4 | 0.7%
a/4 | 4 | 0.7%
Other values (401) | 476 | 84.8%

Dataset B
Value | Count | Frequency (%)
pc | 33 | 5.8%
c.a | 17 | 3.0%
ca | 6 | 1.1%
soton/oq | 6 | 1.1%
ston/o | 6 | 1.1%
2 | 6 | 1.1%
ston/o2 | 4 | 0.7%
a/4 | 4 | 0.7%
1601 | 4 | 0.7%
3101295 | 4 | 0.7%
Other values (392) | 476 | 84.1%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
3 | 394 | 13.2%
1 | 345 | 11.6%
2 | 287 | 9.6%
7 | 264 | 8.9%
4 | 229 | 7.7%
6 | 221 | 7.4%
0 | 206 | 6.9%
5 | 198 | 6.6%
9 | 143 | 4.8%
8 | 127 | 4.3%
Other values (25) | 564 | 18.9%

Dataset B
Value | Count | Frequency (%)
3 | 378 | 12.5%
1 | 357 | 11.8%
2 | 296 | 9.8%
7 | 248 | 8.2%
4 | 220 | 7.3%
6 | 218 | 7.2%
5 | 193 | 6.4%
0 | 192 | 6.4%
9 | 165 | 5.5%
8 | 132 | 4.4%
Other values (22) | 614 | 20.4%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 2978 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 3013 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Fare
Real number (ℝ)

Metric | Dataset A | Dataset B
Distinct | 184 | 181
Distinct (%) | 41.3% | 40.6%
Missing | 0 | 0
Missing (%) | 0.0% | 0.0%
Infinite | 0 | 0
Infinite (%) | 0.0% | 0.0%
Mean | 31.027074 | 36.49358
Minimum | 0 | 0
Maximum | 512.3292 | 512.3292
Zeros | 7 | 3
Zeros (%) | 1.6% | 0.7%
Negative | 0 | 0
Negative (%) | 0.0% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Quantile statistics

Metric | Dataset A | Dataset B
Minimum | 0 | 0
5th percentile | 7.225 | 7.225
Q1 | 7.925 | 8.05
Median | 13.8604 | 15.5
Q3 | 30.64685 | 31.3875
95th percentile | 108.28125 | 134.5
Maximum | 512.3292 | 512.3292
Range | 512.3292 | 512.3292
Interquartile range (IQR) | 22.72185 | 23.3375

Descriptive statistics

Metric | Dataset A | Dataset B
Standard deviation | 45.068833 | 60.602504
Coefficient of variation (CV) | 1.4525647 | 1.6606346
Kurtosis | 34.786731 | 27.308604
Mean | 31.027074 | 36.49358
Median Absolute Deviation (MAD) | 6.46875 | 7.7646
Skewness | 4.7028866 | 4.5682468
Sum | 13838.075 | 16276.137
Variance | 2031.1997 | 3672.6634
Monotonicity | Not monotonic | Not monotonic
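Several rows of these tables are derived from others: Mean = Sum / count, CV = standard deviation / mean, Variance = (standard deviation)², Range = maximum - minimum and IQR = Q3 - Q1. A quick consistency check with Dataset A's Fare figures:

```python
# Values copied from the Fare tables for Dataset A.
n, total = 446, 13838.075
std, q1, q3 = 45.068833, 7.925, 30.64685
minimum, maximum = 0.0, 512.3292

mean = total / n              # Sum / number of non-missing rows
cv = std / mean               # coefficient of variation
variance = std ** 2           # variance is the squared standard deviation
value_range = maximum - minimum
iqr = q3 - q1
print(round(mean, 6), round(cv, 7), round(variance, 4), value_range, round(iqr, 5))
```

Because the standard deviation and mean are themselves rounded in the report, the recomputed values agree only to the displayed precision.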
Histogram with fixed size bins (bins=50)
Common values

Dataset A
Value | Count | Frequency (%)
8.05 | 26 | 5.8%
13 | 24 | 5.4%
7.8958 | 23 | 5.2%
7.75 | 17 | 3.8%
26 | 15 | 3.4%
10.5 | 11 | 2.5%
7.25 | 9 | 2.0%
7.925 | 8 | 1.8%
0 | 7 | 1.6%
8.6625 | 7 | 1.6%
Other values (174) | 299 | 67.0%

Dataset B
Value | Count | Frequency (%)
8.05 | 22 | 4.9%
7.8958 | 21 | 4.7%
13 | 17 | 3.8%
26 | 16 | 3.6%
7.75 | 14 | 3.1%
7.925 | 11 | 2.5%
10.5 | 10 | 2.2%
7.225 | 9 | 2.0%
26.55 | 9 | 2.0%
7.2292 | 8 | 1.8%
Other values (171) | 309 | 69.3%
Minimum 10 values

Dataset A
Value | Count | Frequency (%)
0 | 7 | 1.6%
5 | 1 | 0.2%
6.2375 | 1 | 0.2%
6.4375 | 1 | 0.2%
6.45 | 1 | 0.2%
6.75 | 1 | 0.2%
6.975 | 1 | 0.2%
7.05 | 3 | 0.7%
7.0542 | 1 | 0.2%
7.125 | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
0 | 3 | 0.7%
5 | 1 | 0.2%
6.4375 | 1 | 0.2%
6.75 | 2 | 0.4%
6.95 | 1 | 0.2%
7.0458 | 1 | 0.2%
7.05 | 3 | 0.7%
7.0542 | 1 | 0.2%
7.125 | 1 | 0.2%
7.1417 | 1 | 0.2%

Cabin
Text

Metric | Dataset A | Dataset B
Distinct | 90 | 85
Distinct (%) | 84.1% | 84.2%
Missing | 339 | 345
Missing (%) | 76.0% | 77.4%
Memory size | 7.0 KiB | 7.0 KiB

Length

Metric | Dataset A | Dataset B
Max length | 15 | 15
Median length | 3 | 3
Mean length | 3.5981308 | 3.7227723
Min length | 1 | 1

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 385 | 376
Distinct characters | 18 | 18
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Metric | Dataset A | Dataset B
Unique | 76 | 71
Unique (%) | 71.0% | 70.3%
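With most Cabin values missing, the length, Distinct (%) and Unique (%) figures only add up when they are taken over the 107 (Dataset A) and 101 (Dataset B) rows that have a value. The arithmetic below checks this against the counts in the tables:

```python
def per_present_row(n_rows, n_missing, total_chars, n_distinct, n_unique):
    """Statistics whose denominator is the number of non-missing rows."""
    present = n_rows - n_missing
    return {
        "present": present,
        "mean_length": total_chars / present,
        "distinct_pct": round(100 * n_distinct / present, 1),
        "unique_pct": round(100 * n_unique / present, 1),
    }


cabin_a = per_present_row(446, 339, 385, 90, 76)
cabin_b = per_present_row(446, 345, 376, 85, 71)
print(cabin_a)
print(cabin_b)
```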

Sample

Row | Dataset A | Dataset B
1st row | E24 | E34
2nd row | C128 | G6
3rd row | D11 | F33
4th row | C91 | D26
5th row | C50 | B51 B53 B55

Most common words

Dataset A
Value | Count | Frequency (%)
b96 | 4 | 3.2%
b98 | 4 | 3.2%
f33 | 3 | 2.4%
f | 3 | 2.4%
b18 | 2 | 1.6%
e44 | 2 | 1.6%
e25 | 2 | 1.6%
e67 | 2 | 1.6%
e8 | 2 | 1.6%
d36 | 2 | 1.6%
Other values (91) | 98 | 79.0%

Dataset B
Value | Count | Frequency (%)
c23 | 4 | 3.3%
c25 | 4 | 3.3%
c27 | 4 | 3.3%
d26 | 2 | 1.7%
f33 | 2 | 1.7%
b51 | 2 | 1.7%
b53 | 2 | 1.7%
b55 | 2 | 1.7%
g6 | 2 | 1.7%
e25 | 2 | 1.7%
Other values (86) | 95 | 78.5%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
1 | 42 | 10.9%
B | 39 | 10.1%
3 | 32 | 8.3%
C | 31 | 8.1%
2 | 26 | 6.8%
6 | 26 | 6.8%
8 | 24 | 6.2%
0 | 22 | 5.7%
7 | 20 | 5.2%
9 | 18 | 4.7%
Other values (8) | 105 | 27.3%

Dataset B
Value | Count | Frequency (%)
C | 42 | 11.2%
2 | 42 | 11.2%
B | 33 | 8.8%
3 | 30 | 8.0%
5 | 30 | 8.0%
6 | 26 | 6.9%
1 | 25 | 6.6%
(space) | 20 | 5.3%
4 | 19 | 5.1%
7 | 19 | 5.1%
Other values (8) | 90 | 23.9%

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 385 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 376 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Embarked
Categorical

Metric | Dataset A | Dataset B
Distinct | 3 | 3
Distinct (%) | 0.7% | 0.7%
Missing | 1 | 0
Missing (%) | 0.2% | 0.0%
Memory size | 7.0 KiB | 7.0 KiB

Dataset A: S (314), C (88), Q (43)
Dataset B: S (313), C (100), Q (33)

Length

Metric | Dataset A | Dataset B
Max length | 1 | 1
Median length | 1 | 1
Mean length | 1 | 1
Min length | 1 | 1

Characters and Unicode

Metric | Dataset A | Dataset B
Total characters | 445 | 446
Distinct characters | 3 | 3
Distinct categories | 1 | 1
Distinct scripts | 1 | 1
Distinct blocks | 1 | 1

Unique

Metric | Dataset A | Dataset B
Unique | 0 | 0
Unique (%) | 0.0% | 0.0%

Sample

Row | Dataset A | Dataset B
1st row | S | S
2nd row | S | S
3rd row | S | S
4th row | S | C
5th row | S | S

Common Values

Dataset A
Value | Count | Frequency (%)
S | 314 | 70.4%
C | 88 | 19.7%
Q | 43 | 9.6%
(Missing) | 1 | 0.2%

Dataset B
Value | Count | Frequency (%)
S | 313 | 70.2%
C | 100 | 22.4%
Q | 33 | 7.4%

Length

Histogram of lengths of the category

Common Values (Plot): charts for Dataset A and Dataset B (not reproduced).

Dataset A
Value | Count | Frequency (%)
s | 314 | 70.6%
c | 88 | 19.8%
q | 43 | 9.7%

Dataset B
Value | Count | Frequency (%)
s | 313 | 70.2%
c | 100 | 22.4%
q | 33 | 7.4%

Most occurring characters

Dataset A
Value | Count | Frequency (%)
S | 314 | 70.6%
C | 88 | 19.8%
Q | 43 | 9.7%

Dataset B
Value | Count | Frequency (%)
S | 313 | 70.2%
C | 100 | 22.4%
Q | 33 | 7.4%
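For Embarked in Dataset A, S is 70.4% under Common Values but 70.6% under Most occurring characters. The first table divides by all 446 rows, listing the missing row as (Missing), while the character table divides by the 445 characters actually present. Dataset B has no missing value, so both of its tables agree. A minimal check:

```python
def pct(count, denominator):
    """Percentage rounded to one decimal, as displayed in the report."""
    return round(100 * count / denominator, 1)


embarked_a = {"S": 314, "C": 88, "Q": 43}
n_rows = 446
n_chars = sum(embarked_a.values())  # 445: the missing row contributes no characters

for value, count in embarked_a.items():
    print(value, pct(count, n_rows), pct(count, n_chars))
```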

Most occurring categories

Dataset A
Value | Count | Frequency (%)
(unknown) | 445 | 100.0%

Dataset B
Value | Count | Frequency (%)
(unknown) | 446 | 100.0%

Most occurring scripts and Most occurring blocks: identical to Most occurring categories (a single (unknown) group in each dataset).

Most frequent character per category, per script and per block: the single (unknown) group contains every character, so each of these tables repeats Most occurring characters above.

Interactions

Pairwise interaction plots of the numeric variables for Dataset A and Dataset B (not reproduced).

2025-03-21T10:46:19.426431image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-21T10:46:17.614697image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-21T10:46:19.741198image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

Dataset A

[Figure: correlation heatmap]

Dataset B

[Figure: correlation heatmap]

Dataset A

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.085   0.115  -0.314        0.038   0.259   0.000  -0.206     0.118
Embarked      0.085     1.000   0.200   0.000        0.000   0.265   0.105   0.088     0.220
Fare          0.115     0.200   1.000   0.367        0.013   0.491   0.216   0.448     0.332
Parch        -0.314     0.000   0.367   1.000        0.026   0.000   0.217   0.402     0.153
PassengerId   0.038     0.000   0.013   0.026        1.000   0.000   0.051  -0.055     0.101
Pclass        0.259     0.265   0.491   0.000        0.000   1.000   0.160   0.166     0.397
Sex           0.000     0.105   0.216   0.217        0.051   0.160   1.000   0.195     0.537
SibSp        -0.206     0.088   0.448   0.402       -0.055   0.166   0.195   1.000     0.195
Survived      0.118     0.220   0.332   0.153        0.101   0.397   0.537   0.195     1.000
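The report's High correlation alerts for Sex and Survived can be checked against this table. The sketch below copies the Dataset A matrix and lists every off-diagonal pair whose absolute coefficient reaches a threshold. The 0.5 default is an assumption chosen for illustration, not a value read from this report's configuration, and `strong_pairs` is a hypothetical helper.

```python
from itertools import combinations

# Dataset A correlation matrix, copied row by row from the table above.
COLUMNS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
           "Pclass", "Sex", "SibSp", "Survived"]
MATRIX_A = [
    [1.000, 0.085, 0.115, -0.314, 0.038, 0.259, 0.000, -0.206, 0.118],
    [0.085, 1.000, 0.200, 0.000, 0.000, 0.265, 0.105, 0.088, 0.220],
    [0.115, 0.200, 1.000, 0.367, 0.013, 0.491, 0.216, 0.448, 0.332],
    [-0.314, 0.000, 0.367, 1.000, 0.026, 0.000, 0.217, 0.402, 0.153],
    [0.038, 0.000, 0.013, 0.026, 1.000, 0.000, 0.051, -0.055, 0.101],
    [0.259, 0.265, 0.491, 0.000, 0.000, 1.000, 0.160, 0.166, 0.397],
    [0.000, 0.105, 0.216, 0.217, 0.051, 0.160, 1.000, 0.195, 0.537],
    [-0.206, 0.088, 0.448, 0.402, -0.055, 0.166, 0.195, 1.000, 0.195],
    [0.118, 0.220, 0.332, 0.153, 0.101, 0.397, 0.537, 0.195, 1.000],
]


def strong_pairs(columns, matrix, threshold=0.5):
    """Hypothetical helper: (col_i, col_j, r) for each pair with |r| >= threshold."""
    pairs = []
    # combinations yields i < j only, skipping the diagonal and the mirrored half
    for i, j in combinations(range(len(columns)), 2):
        r = matrix[i][j]
        if abs(r) >= threshold:
            pairs.append((columns[i], columns[j], r))
    return pairs


print(strong_pairs(COLUMNS, MATRIX_A))  # [('Sex', 'Survived', 0.537)]
```

With the 0.5 threshold only Sex/Survived (0.537) qualifies; the next strongest pair, Fare/Pclass at 0.491, falls just short.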

Dataset B

                Age  Embarked    Fare   Parch  PassengerId  Pclass     Sex   SibSp  Survived
Age           1.000     0.000   0.155  -0.208        0.054   0.239   0.098  -0.181     0.192
Embarked      0.000     1.000   0.205   0.084        0.000   0.252   0.066   0.000     0.138
Fare          0.155     0.205   1.000   0.395       -0.093   0.463   0.229   0.416     0.322
Parch        -0.208     0.084   0.395   1.000       -0.051   0.029   0.268   0.367     0.149
PassengerId   0.054     0.000  -0.093  -0.051        1.000   0.000   0.051  -0.073     0.103
Pclass        0.239     0.252   0.463   0.029        0.000   1.000   0.177   0.092     0.366
Sex           0.098     0.066   0.229   0.268        0.051   0.177   1.000   0.204     0.552
SibSp        -0.181     0.000   0.416   0.367       -0.073   0.092   0.204   1.000     0.166
Survived      0.192     0.138   0.322   0.149        0.103   0.366   0.552   0.166     1.000
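Comparing the two tables is easier with the differences computed. The sketch below copies the Survived row from the Dataset A and Dataset B tables and ranks the other variables by how much their correlation with Survived changes from A to B; `correlation_shift` is a hypothetical helper.

```python
# Correlation of each variable with Survived, copied from the Survived rows
# of the Dataset A and Dataset B correlation tables.
SURVIVED_A = {"Age": 0.118, "Embarked": 0.220, "Fare": 0.332, "Parch": 0.153,
              "PassengerId": 0.101, "Pclass": 0.397, "Sex": 0.537, "SibSp": 0.195}
SURVIVED_B = {"Age": 0.192, "Embarked": 0.138, "Fare": 0.322, "Parch": 0.149,
              "PassengerId": 0.103, "Pclass": 0.366, "Sex": 0.552, "SibSp": 0.166}


def correlation_shift(a, b):
    """Hypothetical helper: (variable, b - a) pairs, largest absolute change first."""
    # round() removes float noise such as 0.07400000000000001
    shifts = [(name, round(b[name] - a[name], 3)) for name in a]
    return sorted(shifts, key=lambda item: abs(item[1]), reverse=True)


for name, delta in correlation_shift(SURVIVED_A, SURVIVED_B):
    print(f"{name:<12}{delta:+.3f}")
```

On these values Embarked shifts most (-0.082), followed by Age (+0.074), while Sex remains the strongest correlate of Survived in both datasets.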

Missing values

A simple visualization of nullity by column.

Dataset A

[Figure: nullity count by column]

Dataset B

[Figure: nullity count by column]
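A nullity count chart reduces each column to the number of values that are present. The sketch below computes the same numbers in plain Python for the PassengerId, Age and Cabin values of the first ten Dataset A sample rows, with None standing in for NaN; `nullity_counts` and `print_bars` are hypothetical helpers, not part of ydata-profiling or missingno.

```python
# PassengerId, Age and Cabin from the first ten Dataset A sample rows.
# None marks a missing value (shown as NaN in the report).
COLUMNS = ["PassengerId", "Age", "Cabin"]
ROWS = [
    (629, 26.0, None), (587, 47.0, None), (708, 42.0, "E24"),
    (181, None, None), (73, 21.0, None), (352, None, "C128"),
    (520, 32.0, None), (164, 17.0, None), (766, 51.0, "D11"),
    (388, 36.0, None),
]


def nullity_counts(columns, rows):
    """Hypothetical helper: map each column to its number of non-missing values."""
    return {
        col: sum(row[i] is not None for row in rows)
        for i, col in enumerate(columns)
    }


def print_bars(counts, total, width=20):
    """Hypothetical helper: print one text bar per column, scaled to `total`."""
    for col, n in counts.items():
        bar = "#" * round(width * n / total)
        print(f"{col:<12}{bar:<{width}} {n}/{total}")


counts = nullity_counts(COLUMNS, ROWS)
print_bars(counts, len(ROWS))
```

In these ten rows Age has 8 values present and Cabin only 3, so Cabin's bar is by far the shortest.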

The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completeness.

Dataset A

[Figure: nullity matrix]

Dataset B

[Figure: nullity matrix]
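A nullity matrix draws one line per record and one position per column, marking present values and leaving gaps where values are missing. A text-only sketch of the idea, reusing the PassengerId, Age and Cabin values of the first ten Dataset A sample rows; `nullity_matrix` is a hypothetical helper, not the plotting code behind the figures.

```python
# (PassengerId, Age, Cabin) from the first ten Dataset A sample rows; None = missing.
COLUMNS = ["PassengerId", "Age", "Cabin"]
ROWS = [
    (629, 26.0, None), (587, 47.0, None), (708, 42.0, "E24"),
    (181, None, None), (73, 21.0, None), (352, None, "C128"),
    (520, 32.0, None), (164, 17.0, None), (766, 51.0, "D11"),
    (388, 36.0, None),
]


def nullity_matrix(rows):
    """Hypothetical helper: one string per row, '#' if present, '.' if missing."""
    return ["".join("." if value is None else "#" for value in row) for row in rows]


# Column initials (P, A, C) act as a minimal header.
print("".join(col[0] for col in COLUMNS))
for line in nullity_matrix(ROWS):
    print(line)
```

Reading down the third position shows the mostly empty Cabin column at a glance, which is the kind of pattern the matrix plot is meant to expose.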

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset A

[Figure: nullity correlation heatmap]

Dataset B

[Figure: nullity correlation heatmap]
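Nullity correlation can be computed as the Pearson correlation between per-column missing-value indicators (1 where a value is missing, 0 where it is present). This is a minimal sketch in the spirit of missingno's heatmap, not its actual implementation, again using the ten Dataset A sample rows; `pearson` and `nullity_correlation` are hypothetical helpers.

```python
from math import sqrt

# (PassengerId, Age, Cabin) from the first ten Dataset A sample rows; None = missing.
COLUMNS = ["PassengerId", "Age", "Cabin"]
ROWS = [
    (629, 26.0, None), (587, 47.0, None), (708, 42.0, "E24"),
    (181, None, None), (73, 21.0, None), (352, None, "C128"),
    (520, 32.0, None), (164, 17.0, None), (766, 51.0, "D11"),
    (388, 36.0, None),
]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def nullity_correlation(columns, rows):
    """Correlate the missing-value indicators (1 = missing) of every column pair."""
    indicators = {
        col: [int(row[i] is None) for row in rows] for i, col in enumerate(columns)
    }
    # Columns that are never (or always) missing have zero variance, so skip them.
    varying = [c for c in columns if 0 < sum(indicators[c]) < len(rows)]
    return {
        (a, b): pearson(indicators[a], indicators[b])
        for i, a in enumerate(varying)
        for b in varying[i + 1:]
    }


print(nullity_correlation(COLUMNS, ROWS))
```

On these rows the Age/Cabin value is about -0.22, a weak tendency for Cabin to be present when Age is missing; ten rows are far too few to read anything into it.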

Sample

Dataset A

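    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
628 | 629 | 0 | 3 | Bostandyeff, Mr. Guentcho | male | 26.0 | 0 | 0 | 349224 | 7.8958 | NaN | S
586 | 587 | 0 | 2 | Jarvis, Mr. John Denzil | male | 47.0 | 0 | 0 | 237565 | 15.0000 | NaN | S
707 | 708 | 1 | 1 | Calderhead, Mr. Edward Pennington | male | 42.0 | 0 | 0 | PC 17476 | 26.2875 | E24 | S
180 | 181 | 0 | 3 | Sage, Miss. Constance Gladys | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
72  | 73 | 0 | 2 | Hood, Mr. Ambrose Jr | male | 21.0 | 0 | 0 | S.O.C. 14879 | 73.5000 | NaN | S
351 | 352 | 0 | 1 | Williams-Lambert, Mr. Fletcher Fellows | male | NaN | 0 | 0 | 113510 | 35.0000 | C128 | S
519 | 520 | 0 | 3 | Pavlovic, Mr. Stefo | male | 32.0 | 0 | 0 | 349242 | 7.8958 | NaN | S
163 | 164 | 0 | 3 | Calic, Mr. Jovo | male | 17.0 | 0 | 0 | 315093 | 8.6625 | NaN | S
765 | 766 | 1 | 1 | Hogeboom, Mrs. John C (Anna Andrews) | female | 51.0 | 1 | 0 | 13502 | 77.9583 | D11 | S
387 | 388 | 1 | 2 | Buss, Miss. Kate | female | 36.0 | 0 | 0 | 27849 | 13.0000 | NaN | S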

Dataset B

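    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
419 | 420 | 0 | 3 | Van Impe, Miss. Catharina | female | 10.00 | 0 | 2 | 345773 | 24.1500 | NaN | S
686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.00 | 4 | 1 | 3101295 | 39.6875 | NaN | S
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.00 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S
319 | 320 | 1 | 1 | Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone) | female | 40.00 | 1 | 1 | 16966 | 134.5000 | E34 | C
353 | 354 | 0 | 3 | Arnold-Franchi, Mr. Josef | male | 25.00 | 1 | 0 | 349237 | 17.8000 | NaN | S
237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.00 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S
426 | 427 | 1 | 2 | Clarke, Mrs. Charles V (Ada Maria Winfield) | female | 28.00 | 1 | 0 | 2003 | 26.0000 | NaN | S
162 | 163 | 0 | 3 | Bengtsson, Mr. John Viktor | male | 26.00 | 0 | 0 | 347068 | 7.7750 | NaN | S
401 | 402 | 0 | 3 | Adams, Mr. John | male | 26.00 | 0 | 0 | 341826 | 8.0500 | NaN | S
469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C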

Dataset A

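    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C
286 | 287 | 1 | 3 | de Mulder, Mr. Theodore | male | 30.0 | 0 | 0 | 345774 | 9.5000 | NaN | S
134 | 135 | 0 | 2 | Sobey, Mr. Samuel James Hayden | male | 25.0 | 0 | 0 | C.A. 29178 | 13.0000 | NaN | S
243 | 244 | 0 | 3 | Maenpaa, Mr. Matti Alexanteri | male | 22.0 | 0 | 0 | STON/O 2. 3101275 | 7.1250 | NaN | S
732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S
183 | 184 | 1 | 2 | Becker, Master. Richard F | male | 1.0 | 2 | 1 | 230136 | 39.0000 | F4 | S
521 | 522 | 0 | 3 | Vovk, Mr. Janko | male | 22.0 | 0 | 0 | 349252 | 7.8958 | NaN | S
868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S
844 | 845 | 0 | 3 | Culumovic, Mr. Jeso | male | 17.0 | 0 | 0 | 315090 | 8.6625 | NaN | S
370 | 371 | 1 | 1 | Harder, Mr. George Achilles | male | 25.0 | 1 | 0 | 11765 | 55.4417 | E50 | C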

Dataset B

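    | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q
147 | 148 | 0 | 3 | Ford, Miss. Robina Maggie "Ruby" | female | 9.0 | 2 | 2 | W./C. 6608 | 34.3750 | NaN | S
258 | 259 | 1 | 1 | Ward, Miss. Anna | female | 35.0 | 0 | 0 | PC 17755 | 512.3292 | NaN | C
766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C
660 | 661 | 1 | 1 | Frauenthal, Dr. Henry William | male | 50.0 | 2 | 0 | PC 17611 | 133.6500 | NaN | S
587 | 588 | 1 | 1 | Frolicher-Stehli, Mr. Maxmillian | male | 60.0 | 1 | 1 | 13567 | 79.2000 | B41 | C
694 | 695 | 0 | 1 | Weir, Col. John | male | 60.0 | 0 | 0 | 113800 | 26.5500 | NaN | S
8   | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S
854 | 855 | 0 | 2 | Carter, Mrs. Ernest Courtenay (Lilian Hughes) | female | 44.0 | 1 | 0 | 244252 | 26.0000 | NaN | S
460 | 461 | 1 | 1 | Anderson, Mr. Harry | male | 48.0 | 0 | 0 | 19952 | 26.5500 | E12 | S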
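Age prints as 10.00 and 14.00 in the first Dataset B sample but as 26.0 and 47.0 in the other samples. This looks like pandas-style float display, which gives every value in a column the number of decimals needed by its most precise entry (0.75 in that sample). The stdlib sketch below imitates that rule under this assumption; `shared_precision` is a hypothetical name, not a pandas function.

```python
from decimal import Decimal


def shared_precision(values):
    """Format floats with the fewest decimals (at least one) that show every value exactly.

    Hypothetical imitation of pandas-style column formatting, not pandas' own code.
    """
    def decimals(v):
        # repr() gives the shortest round-trip text; normalize() drops trailing zeros
        exponent = Decimal(repr(v)).normalize().as_tuple().exponent
        return max(0, -exponent)

    places = max(1, max(decimals(v) for v in values))
    return [f"{v:.{places}f}" for v in values]


# Ages from the first Dataset B sample include 0.75, forcing two decimals.
print(shared_precision([10.0, 14.0, 25.0, 0.75]))  # ['10.00', '14.00', '25.00', '0.75']
# Ages from the first Dataset A sample need only one decimal.
print(shared_precision([26.0, 47.0, 42.0]))        # ['26.0', '47.0', '42.0']
```

The same rule explains why Fare always shows four decimals: values such as 7.8958 set the precision for the whole column.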

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
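Duplicate detection comes down to counting identical records and keeping those seen more than once; the report's duplicate-row table lists each repeated row with its number of occurrences. A plain-Python sketch of that count, using a few rows from the Dataset A sample; `duplicate_rows` is a hypothetical helper, not ydata-profiling's implementation.

```python
from collections import Counter


def duplicate_rows(rows):
    """Hypothetical helper: map each row that occurs more than once to its count.

    Rows must be hashable (e.g. tuples). Use None rather than float('nan') for
    missing values: two separate NaN objects compare unequal, so rows holding
    them would never be matched as duplicates.
    """
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Three (PassengerId, Age, Cabin) rows from the Dataset A sample: no repeats.
sample = [(629, 26.0, None), (181, None, None), (708, 42.0, "E24")]
print(duplicate_rows(sample))  # {}

# Appending a copy of the first row creates one duplicated row with count 2.
print(duplicate_rows(sample + [sample[0]]))  # {(629, 26.0, None): 2}
```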